$$ \def\eb{\boldsymbol{e}} \def\fb{\boldsymbol{f}} \def\hb{\boldsymbol{h}} \def\xb{\boldsymbol{x}} \def\Rb{\boldsymbol{R}} \def\Real{\mathbb{R}} \def\bfzero{\boldsymbol{0}} \newcommand{\ddy}[2]{\frac{\partial{#1}}{\partial{#2}}} $$
The goal of this chapter is to explore and begin to answer the following question:
How do we represent mathematical functions on a computer?
Polynomial Interpolation: Motivation
If \(f\) is a polynomial of degree \(n\), \[ f(x) = p_n(x) = a_0 + a_1x + \ldots + a_nx^n, \] then we only need to store the \(n+1\) coefficients \(a_0,\ldots,a_n\). Operations such as taking the derivative or integrating \(f\) are also convenient. The idea in this chapter is to find a polynomial that approximates a general function \(f\). For a continuous function \(f\) on a bounded interval, this is always possible if you take a high enough degree polynomial:
Theorem 2.1: Weierstrass Approximation Theorem (1885)
This may be proved using an explicit sequence of polynomials, called Bernstein polynomials.
If \(f\) is not continuous, then something other than a polynomial is required: polynomials are themselves continuous (indeed smooth), so they cannot reproduce jumps or asymptotes.
To approximate functions like \(1/x\), there is a well-developed theory of rational function interpolation, which is beyond the scope of this course.
In this chapter, we look for a suitable polynomial \(p_n\) by interpolation—that is, requiring \(p_n(x_i) = f(x_i)\) at a finite set of points \(x_i\), usually called nodes. Sometimes we will also require the derivative(s) of \(p_n\) to match those of \(f\). This type of approximation, where we match known values of the function at particular points, is very natural in many applications. For example, weather forecasts involve numerically solving huge systems of partial differential equations (PDEs) on a discrete grid of points. If we want weather predictions between grid points, we must interpolate. Figure 2.1 shows the spatial resolutions of a range of current and past weather models produced by the UK Met Office.
Taylor series
A truncated Taylor series is (in some sense) the simplest interpolating polynomial since it uses only a single node \(x_0\), although it does require \(p_n\) to match both \(f\) and some of its derivatives.
We can approximate \(f(x)=\sin(x)\), say at \(x=0.1\), using a Taylor series about the point \(x_0=0\), which is \[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots. \] This comes from writing \[ f(x) = a_0 + a_1(x-x_0) + a_2(x-x_0)^2 + \ldots, \] then differentiating term-by-term and matching values at \(x_0\): \[\begin{align*} f(x_0) &= a_0,\\ f'(x_0) &= a_1,\\ f''(x_0) &= 2a_2,\\ f'''(x_0) &= 3(2)a_3,\\ &\vdots\\ \implies f(x) &= f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \ldots. \end{align*}\] So \[\begin{align*} \textrm{1 term} \;&\implies\; f(0.1) \approx 0.1,\\ \textrm{2 terms} \;&\implies\; f(0.1) \approx 0.1 - \frac{0.1^3}{6} = 0.099833\ldots,\\ \textrm{3 terms} \;&\implies\; f(0.1) \approx 0.1 - \frac{0.1^3}{6} + \frac{0.1^5}{120} = 0.09983341\ldots.\\ \end{align*}\] The next term will be \(-0.1^7/7! \approx -10^{-7}/(5\times 10^3) = -2\times 10^{-11}\), which won’t change the answer to 6 s.f.
The exact answer is \(\sin(0.1)=0.09983341\).
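The partial sums above are easy to check numerically. A minimal sketch in Python (the helper name `taylor_sin` is ours):

```python
import math

def taylor_sin(x, n_terms):
    """Partial sum of the Taylor series of sin(x) about x0 = 0."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

for terms in (1, 2, 3):
    print(terms, taylor_sin(0.1, terms))

# The 3-term sum already agrees with math.sin(0.1) to about 10 decimal places.
```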
Mathematically, we can write the remainder as follows.
Theorem 2.2: Taylor’s Theorem
The sum is called the Taylor polynomial of degree \(n\), and the last term is called the Lagrange form of the remainder. Note that the unknown number \(\xi\) depends on \(x\).
For \(f(x)=\sin(x)\), we found the Taylor polynomial \(p_6(x) = x - x^3/3! + x^5/5!\), and \(f^{(7)}(x)=-\sin(x)\). So we have \[ \big|f(x) - p_6(x)\big| = \left|\frac{f^{(7)}(\xi)}{7!}(x-x_0)^7\right| \] for some \(\xi\) between \(x_0\) and \(x\). For \(x=0.1\), we have \[ \big|f(0.1) - p_6(0.1)\big| = \frac{1}{5040}(0.1)^7\big|f^{(7)}(\xi)\big| \quad \textrm{for some $\xi\in[0,0.1]$}. \] Since \(\big|f^{(7)}(\xi)\big| = \big|\sin(\xi)\big| \leq 1\), we can say, before calculating, that the error satisfies \[ \big|f(0.1) - p_6(0.1)\big| \leq 1.984\times 10^{-11}. \]
The actual error is \(1.983\times 10^{-11}\), so this is a tight estimate.
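This bound is also easy to verify numerically; a quick sketch:

```python
import math

# p_6(0.1): the Taylor polynomial x - x^3/3! + x^5/5! evaluated at x = 0.1.
p6 = 0.1 - 0.1**3 / math.factorial(3) + 0.1**5 / math.factorial(5)

actual_error = abs(math.sin(0.1) - p6)
bound = 0.1**7 / math.factorial(7)   # worst case, using |f^(7)(xi)| <= 1

print(actual_error, bound)           # ~1.983e-11 vs ~1.984e-11
assert actual_error <= bound
```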
Since this error arises from approximating \(f\) with a truncated series, rather than due to rounding, it is known as truncation error. Note that it tends to be lower if you use more terms (larger \(n\)), or if the function oscillates less (smaller \(f^{(n+1)}\) on the interval \((x_0,x)\)).
Error estimates like the Lagrange remainder play an important role in numerical analysis and computation, so it is important to understand where they come from. The number \(\xi\) will ultimately come from Rolle’s theorem, which is a special case of the mean value theorem from first-year calculus:
Theorem 2.3: Rolle’s Theorem
Note that Rolle’s Theorem does not tell us the value of \(\xi\), so in practice we must take some kind of worst-case estimate to get an error bound, e.g. calculate the maximum of \(|f'(\xi)|\) over the range of possible \(\xi\) values.
2.1 Polynomial interpolation
The classical problem of polynomial interpolation is to find a polynomial \[ p_n(x) = a_0 + a_1x + \ldots + a_n x^n = \sum_{k=0}^n a_k x^k \] that interpolates our function \(f\) at a finite set of nodes \(\{x_0, x_1, \ldots, x_m\}\). In other words, \(p_n(x_i)=f(x_i)\) at each of the nodes \(x_i\). Since the polynomial has \(n+1\) unknown coefficients, we expect to need \(n+1\) distinct nodes, so let us assume that \(m=n\).
Here we have two nodes \(x_0\), \(x_1\), and seek a polynomial \(p_1(x) = a_0 + a_1x\). Then the interpolation conditions require that \[ \begin{cases} p_1(x_0) = a_0 + a_1x_0 = f(x_0)\\ p_1(x_1) = a_0 + a_1x_1 = f(x_1) \end{cases} \implies\quad p_1(x) = \frac{x_1f(x_0) - x_0f(x_1)}{x_1 - x_0} + \frac{f(x_1) - f(x_0)}{x_1 - x_0}x. \]
For general \(n\), the interpolation conditions require \[ \begin{matrix} a_0 &+ a_1x_0 &+ a_2x_0^2 &+ \ldots &+ a_nx_0^n &= f(x_0),\\ a_0 &+ a_1x_1 &+ a_2x_1^2 &+ \ldots &+ a_nx_1^n &= f(x_1),\\ \vdots & \vdots & \vdots & &\vdots & \vdots\\ a_0 &+ a_1x_n &+ a_2x_n^2 &+ \ldots &+ a_nx_n^n &= f(x_n), \end{matrix} \] so we have to solve \[ \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ \vdots & \vdots &\vdots& & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n\\ \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ \vdots\\ a_n \end{pmatrix} = \begin{pmatrix} f(x_0)\\ f(x_1)\\ \vdots\\ f(x_n) \end{pmatrix}. \] This is called a Vandermonde matrix. The determinant of this matrix is \[ \det(A) = \prod_{0\leq i < j\leq n} (x_j - x_i), \] which is non-zero provided the nodes are all distinct. This establishes an important result, where \(\mathcal{P}_n\) denotes the space of all real polynomials of degree \(\leq n\).
Theorem 2.4: Existence/uniqueness
We may also prove uniqueness by the following elegant argument.
Proof (Uniqueness part of Existence/Uniqueness Theorem):
Suppose that in addition to \(p_n\) there is another interpolating polynomial \(q_n\in\mathcal{P}_n\). Then the difference \(r_n := p_n - q_n\) is also a polynomial with degree \(\leq n\). But we have \[
r_n(x_i) = p_n(x_i) - q_n(x_i) = f(x_i)-f(x_i)=0 \quad \textrm{for $i=0,\ldots,n$},
\] so \(r_n(x)\) has \(n+1\) roots. From the Fundamental Theorem of Algebra, this is possible only if \(r_n(x)\equiv 0\), which implies that \(q_n=p_n\).
Note that the unique polynomial through \(n+1\) points may have degree \(< n\). This happens when \(a_n=0\) in the solution to the Vandermonde system above.
For \(f(x)=\cos(x)\) with nodes \(x_0=0\), \(x_1=\tfrac{\pi}{2}\), \(x_2=\pi\), we have \(f(x_0)=1\), \(f(x_1)=0\), \(f(x_2)=-1\). Clearly the unique interpolant is the straight line \(p_2(x) = 1 - \tfrac2\pi x\).
If we took the nodes \(\{0,2\pi,4\pi\}\), we would get a constant function \(p_2(x)=1\).
One way to compute the interpolating polynomial would be to solve the Vandermonde system above, e.g. by Gaussian elimination. However, this is not recommended. In practice, we choose a different basis for \(p_n\); there are two common and effective choices due to Lagrange and Newton.
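For small \(n\), though, the Vandermonde solve is a useful sanity check. A sketch with NumPy (illustrative only; the conditioning of Vandermonde matrices degrades rapidly as \(n\) grows):

```python
import numpy as np

# Interpolate f(x) = cos(x) at three nodes by solving the Vandermonde system.
f = np.cos
nodes = np.array([-1.0, 0.5, 2.0])

V = np.vander(nodes, increasing=True)   # rows are [1, x_i, x_i^2]
a = np.linalg.solve(V, f(nodes))        # coefficients a_0, a_1, a_2

p = np.polynomial.Polynomial(a)
assert np.allclose(p(nodes), f(nodes))  # interpolation conditions hold
```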
The Vandermonde matrix arises when we write \(p_n\) in the natural basis \(\{1,x,x^2,\ldots\}\), but we could also choose to work in some other basis…
2.1.1 Lagrange Polynomials
This uses a special basis of polynomials \(\{\ell_k\}\) in which the interpolation equations reduce to the identity matrix. In other words, the coefficients in this basis are just the function values, \[ p_n(x) = \sum_{k=0}^n f(x_k)\ell_k(x). \]
Example 2.1: Linear interpolation again.
For general \(n\), the \(n+1\) Lagrange polynomials are defined as a product \[ \ell_k(x) = \prod_{\substack{j=0\\j\neq k}}^n\frac{x - x_j}{x_k - x_j}. \] By construction, they have the property that \[ \ell_k(x_i) = \begin{cases} 1 & \textrm{if $i=k$},\\ 0 & \textrm{otherwise}. \end{cases} \] From this, it follows that the interpolating polynomial may be written as above.
By the Existence/Uniqueness Theorem, the Lagrange polynomials are the unique polynomials with this property.
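The definition translates directly into code; a naive sketch (each evaluation costs \(O(n^2)\) operations, and the function names are ours):

```python
import numpy as np

def lagrange_basis(k, x, nodes):
    """ell_k(x) = product over j != k of (x - x_j) / (x_k - x_j)."""
    terms = [(x - xj) / (nodes[k] - xj)
             for j, xj in enumerate(nodes) if j != k]
    return np.prod(terms, axis=0)

def lagrange_interp(x, nodes, values):
    """p_n(x) = sum_k f(x_k) * ell_k(x)."""
    return sum(v * lagrange_basis(k, x, nodes) for k, v in enumerate(values))

nodes = [0.0, 1.0, 3.0]
# Cardinal property: ell_k(x_i) = 1 if i == k, else 0.
for k in range(3):
    for i in range(3):
        assert np.isclose(lagrange_basis(k, nodes[i], nodes),
                          1.0 if i == k else 0.0)

# A degree-2 interpolant reproduces any quadratic exactly.
q = lambda x: 2 * x**2 - x + 1
assert np.isclose(lagrange_interp(2.0, nodes, [q(t) for t in nodes]), q(2.0))
```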
Example 2.2: Compute the quadratic interpolating polynomial to \(f(x)=\cos(x)\) with nodes \(\{-\tfrac\pi4, 0, \tfrac\pi4\}\) using Lagrange polynomials.
Lagrange polynomials were actually discovered by Edward Waring in 1779 and rediscovered by Euler in 1783, before being published by Lagrange himself in 1795; a classic example of Stigler’s law of eponymy!
The Lagrange form of the interpolating polynomial is easy to write down, but expensive to evaluate since all of the \(\ell_k\) must be computed. Moreover, changing any of the nodes means that the \(\ell_k\) must all be recomputed from scratch, and similarly for adding a new node (moving to higher degree).
2.1.2 Newton/Divided-Difference Polynomials
It would be easy to increase the degree of \(p_n\) if \[ p_{n+1}(x) = p_{n}(x) + g_{n+1}(x), \quad \textrm{where $g_{n+1}\in{\cal P}_{n+1}$}. \] From the interpolation conditions, we know that \[ g_{n+1}(x_i) = p_{n+1}(x_i) - p_{n}(x_i) = f(x_i)-f(x_i)=0 \quad \textrm{for $i=0,\ldots,n$}, \] so \[ g_{n+1}(x) = a_{n+1}(x-x_0)\cdots(x-x_{n}). \] The coefficient \(a_{n+1}\) is determined by the remaining interpolation condition at \(x_{n+1}\), so \[ p_n(x_{n+1}) + g_{n+1}(x_{n+1}) = f(x_{n+1}) \quad \implies \quad a_{n+1} = \frac{f(x_{n+1}) - p_{n}(x_{n+1})}{(x_{n+1}-x_0)\cdots(x_{n+1}-x_{n})}. \]
The polynomial \((x-x_0)(x-x_1)\cdots(x-x_{n})\) is called a Newton polynomial. These form a new basis \[ n_0(x)=1, \qquad n_k(x) = \prod_{j=0}^{k-1}(x-x_j) \quad \textrm{for $k>0$}. \]
The Newton form of the interpolating polynomial is then \[ p_n(x) = \sum_{k=0}^n a_kn_k(x), \qquad a_0 = f(x_0),\qquad a_k = \frac{f(x_k) - p_{k-1}(x_k)}{(x_k-x_0)\cdots(x_k-x_{k-1})} \textrm{ for $k>0$}. \] Notice that \(a_k\) depends only on \(x_0,\ldots x_k\), so we can construct first \(a_0\), then \(a_1\), etc.
It turns out that the \(a_k\) are easy to compute, but it will take a little work to derive the method. We define the divided difference \(f[x_0,x_1,\ldots,x_k]\) to be the coefficient of \(x^k\) in the polynomial interpolating \(f\) at nodes \(x_0,\ldots,x_k\). It follows that \[ f[x_0,x_1,\ldots,x_k] = a_k, \] where \(a_k\) is the coefficient in the Newton form above.
Example 2.3: Compute the Newton interpolating polynomial at two nodes.
Example 2.4: Compute the Newton interpolating polynomial at three nodes.
In general, we have the following.
Theorem 2.5
Proof:
Without loss of generality, we relabel the nodes so that \(i=0\). So we want to prove that \[
f[x_0,x_1,\ldots,x_{k}] = \frac{f[x_1,\ldots,x_{k}] - f[x_0,\ldots,x_{k-1}]}{x_{k} - x_0}.
\] The trick is to write the interpolant with nodes \(x_0, \ldots, x_k\) in the form \[
p_k(x) = \frac{(x_k-x)q_{k-1}(x) + (x-x_0)\tilde{q}_{k-1}(x)}{x_k-x_0},
\] where \(q_{k-1}\in{\cal P}_{k-1}\) interpolates \(f\) at the subset of nodes \(x_0, x_1, \ldots, x_{k-1}\) and \(\tilde{q}_{k-1}\in{\cal P}_{k-1}\) interpolates \(f\) at the subset \(x_1, x_2,\ldots,x_k\). If this holds, then matching the coefficient of \(x^k\) on each side will give the divided difference formula, since, e.g., the leading coefficient of \(q_{k-1}\) is \(f[x_0,\ldots,x_{k-1}]\). To see that \(p_k\) may really be written this way, note that \[
\begin{aligned}
p_k(x_0) &= q_{k-1}(x_0) = f(x_0),\\
p_k(x_k) &= \tilde{q}_{k-1}(x_k) = f(x_k),\\
p_k(x_i) &= \frac{(x_k-x_i)q_{k-1}(x_i) + (x_i-x_0)\tilde{q}_{k-1}(x_i)}{x_k - x_0} = f(x_i) \quad \textrm{for $i=1,\ldots,k-1$}.
\end{aligned}
\] Since \(p_k\) agrees with \(f\) at the \(k+1\) nodes, it is the unique interpolant in \({\cal P}_k\).
Theorem 2.5 gives us a convenient method: construct a divided-difference table.
Example 2.5: Construct the Newton polynomial at the nodes \(\{-1,0,1,2\}\) and with corresponding function values \(\{5,1,1,11\}\)
Notice that the final divided difference \(f[x_0,x_1,x_2,x_3]\) does not vanish for these particular data, so the interpolant has full degree 3. If it had vanished, the data would have been consistent with some \(f\in{\cal P}_2\).
Note that the value of \(f[x_0,x_1,\ldots,x_k]\) is independent of the order of the nodes in the table. This follows from the uniqueness of \(p_k\).
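The table is straightforward to code. A sketch (function names ours) that computes the Newton coefficients in place, checked against the data of the example above:

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients a_k = f[x_0, ..., x_k]."""
    a = [float(y) for y in ys]
    for k in range(1, len(xs)):
        # After pass k, a[i] holds f[x_{i-k}, ..., x_i] for i >= k.
        for i in range(len(xs) - 1, k - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - k])
    return a

def newton_eval(x, xs, a):
    """Evaluate the Newton form by nested multiplication."""
    result = a[-1]
    for k in range(len(a) - 2, -1, -1):
        result = result * (x - xs[k]) + a[k]
    return result

xs, ys = [-1, 0, 1, 2], [5, 1, 1, 11]
a = divided_differences(xs, ys)
print(a)   # [5.0, -4.0, 2.0, 1.0]
assert all(newton_eval(x, xs, a) == y for x, y in zip(xs, ys))
```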
Divided differences are actually approximations to derivatives of \(f\): one can show that \(f[x_0,x_1,\ldots,x_k] = f^{(k)}(\xi)/k!\) for some \(\xi\) in the smallest interval containing the nodes, provided \(f\in C^k\). In the limit that the nodes all coincide, the Newton form of \(p_n(x)\) becomes the Taylor polynomial.
2.1.3 Interpolation Error
The goal here is to estimate the error \(|f(x)-p_n(x)|\) when we approximate a function \(f\) by a polynomial interpolant \(p_n\). Clearly this will depend on \(x\).
Example 2.6: Quadratic interpolant for \(f(x)=\cos(x)\) with \(\{-\tfrac\pi4,0,\tfrac\pi4\}\).
Clearly the error vanishes at the nodes themselves, but notice that the approximation is generally better near the middle of the set of nodes than near the ends — this is quite typical behaviour.
We can adapt the proof of Taylor’s theorem to get a quantitative error estimate.
Theorem 2.6: Cauchy’s Interpolation Error Theorem
This looks similar to the error formula for Taylor polynomials (see Taylor’s Theorem). But now the error vanishes at multiple nodes rather than just at \(x_0\).
From the formula, you can see that the error will be larger for a more “wiggly” function, where the derivative \(f^{(n+1)}\) is larger. It might also appear that the error will go down as the number of nodes \(n\) increases; we will see in Section 2.1.4 that this is not always true.
As in Taylor’s theorem, note the appearance of an undetermined point \(\xi\). This will prevent us knowing the error exactly, but we can make an estimate as before.
Example 2.7: Quadratic interpolant for \(f(x)=\cos(x)\) with \(\{-\tfrac\pi4,0,\tfrac\pi4\}\).
For an upper bound on the error at a particular \(x\), we can just use \(|\sin(\xi)|\leq 1\) and plug in \(x\).
To bound the maximum error over the interval \([-1,1]\), we maximise \(|w(x)|\), where \(w(x)=x(x+\tfrac\pi4)(x-\tfrac\pi4) = x^3 - \tfrac{\pi^2}{16}x\). We have \(w'(x) = 3x^2 - \tfrac{\pi^2}{16}\), so the turning points are at \(x=\pm\tfrac\pi{4\sqrt{3}}\). Comparing turning points and endpoints: \[ w(-\tfrac\pi{4\sqrt{3}}) = 0.186\ldots, \quad w(\tfrac\pi{4\sqrt{3}}) = -0.186\ldots, \quad w(-1) = -0.383\ldots, \quad w(1) = 0.383\ldots. \]
So our error estimate for \(x\in[-1,1]\) is \[ |f(x) - p_2(x)| \leq \tfrac16(0.383) = 0.0638\ldots \]
From the plot earlier, we see that this bound is satisfied (as it has to be), although not tight.
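Both the maximisation of \(w\) and the final bound can be checked on a grid (a numerical sanity check, not a proof):

```python
import numpy as np

nodes = np.array([-np.pi/4, 0.0, np.pi/4])
xs = np.linspace(-1.0, 1.0, 10001)

# w(x) = x (x + pi/4)(x - pi/4); its maximum modulus on [-1, 1] is at x = +/-1.
w = xs * (xs + np.pi/4) * (xs - np.pi/4)
assert np.isclose(np.max(np.abs(w)), 0.3832, atol=1e-3)

# The quadratic interpolant of cos at the three nodes (via a Vandermonde solve).
a = np.linalg.solve(np.vander(nodes, increasing=True), np.cos(nodes))
max_err = np.max(np.abs(np.cos(xs) - (a[0] + a[1]*xs + a[2]*xs**2)))
assert max_err <= 0.3832 / 6   # the theoretical bound; the true error is smaller
```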
2.1.4 Node Placement: Chebyshev nodes
You might expect polynomial interpolation to converge as \(n\to\infty\). Surprisingly, this is not the case if you take equally-spaced nodes \(x_i\). This was shown by Runge in a famous 1901 paper.
Example 2.8: The Runge function \(f(x) = 1/(1 + 25x^2)\) on \([-1,1]\).
Notice that \(p_n\) is converging to \(f\) in the middle, but diverging more and more near the ends, even within the interval \([x_0,x_n]\). This is called the Runge phenomenon.
A full mathematical explanation for this divergence usually uses complex analysis — see Chapter 13 of Approximation Theory and Approximation Practice by L.N. Trefethen (SIAM, 2013). For a more elementary proof, see this StackExchange post.
The problem is (largely) coming from the error polynomial \[ w(x) = \prod_{i=0}^n(x-x_i), \] which appears in Cauchy’s error formula. We can avoid the Runge phenomenon by choosing nodes \(x_i\) that are not uniformly spaced.
Since the problems are occurring near the ends of the interval, it would be logical to put more nodes there. A good choice is given by taking equally-spaced points on the unit circle \(|z|=1\), and projecting to the real line:
The points around the circle are \[ \phi_j = \frac{(2j+1)\pi}{2(n+1)}, \quad j=0,\ldots,n, \] so the corresponding Chebyshev nodes are \[ x_j = \cos\left[\frac{(2j + 1)\pi}{2(n+1)}\right], \quad j=0,\ldots,n. \]
Example 2.9: The Runge function \(f(x) = 1/(1+25x^2)\) on \([-1,1]\) using the Chebyshev nodes.
Below we illustrate the resulting interpolant for \(n=15\):
Compare this to the example with equally spaced nodes.
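The improvement is easy to quantify. A sketch comparing the maximum errors at \(n = 15\), using a divided-difference fit and nested evaluation (helper name is ours):

```python
import numpy as np

def newton_interp(nodes, values, xs):
    """Fit by divided differences, then evaluate the Newton form at xs."""
    n = len(nodes)
    a = np.array(values, dtype=float)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (nodes[i] - nodes[i - k])
    p = np.full_like(xs, a[-1])
    for k in range(n - 2, -1, -1):
        p = p * (xs - nodes[k]) + a[k]
    return p

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # the Runge function
n = 15
xs = np.linspace(-1.0, 1.0, 2001)

equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))

err_equi = np.max(np.abs(f(xs) - newton_interp(equi, f(equi), xs)))
err_cheb = np.max(np.abs(f(xs) - newton_interp(cheb, f(cheb), xs)))
print(err_equi, err_cheb)   # the equispaced error is far larger
assert err_cheb < err_equi
```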
In fact, the Chebyshev nodes are, in one sense, an optimal choice. To see this, we first note that they are zeroes of a particular polynomial.
The Chebyshev points \(x_j=\cos\left[\frac{(2j+1)\pi}{2(n+1)}\right]\) for \(j=0,\ldots,n\) are zeroes of the Chebyshev polynomial \[ T_{n+1}(t) := \cos\big[(n+1)\arccos(t)\big], \] since \((n+1)\arccos(x_j) = \frac{(2j+1)\pi}{2}\) is an odd multiple of \(\pi/2\).
The Chebyshev polynomials are denoted \(T_n\) rather than \(C_n\) because the name is transliterated from Russian as “Tchebychef” in French, for example.
In choosing the Chebyshev nodes, we are choosing the error polynomial \(w(x):=\prod_{i=0}^n(x-x_i)\) to be \(T_{n+1}(x)/2^n\). (This normalisation makes the leading coefficient 1, since \(T_{n+1}\) has leading coefficient \(2^n\).) This is a good choice because of the following result.
Theorem 2.7: Chebyshev interpolation
Having established that the Chebyshev polynomial minimises the maximum error, we can see convergence as \(n\to\infty\) from the fact that \[ |f(x) - p_n(x)| = \frac{|f^{(n+1)}(\xi)|}{(n+1)!}|w(x)| = \frac{|f^{(n+1)}(\xi)|}{2^n(n+1)!}|T_{n+1}(x)| \leq \frac{|f^{(n+1)}(\xi)|}{2^n(n+1)!}. \]
If the function is well-behaved enough that \(|f^{(n+1)}(x)| \leq M\) for all \(x \in [-1,1]\), with a constant \(M\) independent of \(n\), then the error tends to zero as \(n \to \infty\), since \(M/\big(2^n(n+1)!\big) \to 0\).
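For example, every derivative of \(f(x)=\sin(x)\) is bounded by \(M=1\), so the bound can be checked directly (a sketch; `np.polyfit` through \(n+1\) points recovers the interpolant exactly, up to rounding):

```python
import math
import numpy as np

n = 5
j = np.arange(n + 1)
cheb = np.cos((2*j + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes

coeffs = np.polyfit(cheb, np.sin(cheb), n)   # degree-n interpolant of sin
xs = np.linspace(-1.0, 1.0, 2001)
max_err = np.max(np.abs(np.sin(xs) - np.polyval(coeffs, xs)))

bound = 1.0 / (2**n * math.factorial(n + 1))  # M = 1 for sin
print(max_err, bound)
assert max_err <= bound
```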